Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port support for commercial GeoIP2 databases from Logstash. #24889

Merged
merged 4 commits into from
Jun 13, 2017

Conversation

nezirus
Copy link
Contributor

@nezirus nezirus commented May 25, 2017

Port support for commercial GeoIP2 databases from Logstash, City and Country for now. If there is interest, I'll submit ISP database in separate pull request.

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@talevy talevy added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP :Plugin Ingest GeoIp labels May 25, 2017
@talevy
Copy link
Contributor

talevy commented May 25, 2017

jenkins please test this

@talevy
Copy link
Contributor

talevy commented May 25, 2017

thanks for this fix @nezirus, Unfortunately, we have no good way to test this against a read database. mind commenting on your configuration and experience when running with this fix and a non-lite version of the database?

@nezirus
Copy link
Contributor Author

nezirus commented May 29, 2017

We use it exactly the same like non commercial databases (minus the explicit 'database_file' option):

So, after installing the plugin on all nodes in cluster, copying (gzipped) commercial databases, we update the pipeline (actual grok rules ommited):

PUT _ingest/pipeline/geoip-test
{
   "description": "geoip-test",
   "processors": [
      {
         "grok": {...},
         "geoip": {
            "field": "client_ip",
            "database_file": "GeoIP2-City.mmdb.gz"
         }
      }
   ]
}

This is still young, but we haven't noticed any issues or noticed errors in logs.

@nezirus
Copy link
Contributor Author

nezirus commented Jun 7, 2017

Hello guys, have you had the time to review this. We're running this for the last week or so without any noticeable problems (terabytes of data have already passed through the system).

Btw, for performance reasons, we're running it with 'ingest.geoip.cache_size' > 100k.

@talevy
Copy link
Contributor

talevy commented Jun 9, 2017

thanks for following up @nezirus,

after thinking about how we do not have test infrastructure to verify the unique references to the commercial databaseTypes, I think it would make more sense to move from the switch statements on dbReader.getMetadata().getDatabaseType() and the other equals checks to if statements with endsWith("-City") and endsWith("-Country"). this way the code coverage is complete, and it would enable the flexibility of supporting the commercial DB types.

let me know if that sounds reasonable to you!

@nezirus
Copy link
Contributor Author

nezirus commented Jun 12, 2017

Sure, that makes sense, I've updated my branch.

@talevy
Copy link
Contributor

talevy commented Jun 12, 2017

💪

Jenkins, please test this

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one nit.

@@ -57,8 +57,8 @@
public final class GeoIpProcessor extends AbstractProcessor {

public static final String TYPE = "geoip";
private static final String CITY_DB_TYPE = "GeoLite2-City";
private static final String COUNTRY_DB_TYPE = "GeoLite2-Country";
private static final String CITY_DB_TYPE = "-City";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now a suffix, can you change the constant name to reflect that, eg CITY_DB_SUFFIX? And same with country as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we go, renamed in my branch.

Copy link
Contributor

@talevy talevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet!
LGTM

@talevy talevy merged commit 82897e2 into elastic:master Jun 13, 2017
talevy pushed a commit that referenced this pull request Jun 13, 2017
* Port support for commercial GeoIP2 databases from Logstash.

* Match GeoIP databases according to the database name suffix.

* Rename CITY/COUNTRY_DB_TYPE, since they are suffixes now.
@talevy
Copy link
Contributor

talevy commented Jun 13, 2017

merged these changes into master for the 6.0 release, as well as 5.x for the 5.6 release!

@talevy talevy removed the review label Jun 13, 2017
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 14, 2017
* master: (27 commits)
  Refactor TransportShardBulkAction.executeUpdateRequest and add tests
  Make sure range queries are correctly profiled. (elastic#25108)
  Test: allow setting socket timeout for rest client (elastic#25221)
  Migration docs for elastic#25080 (elastic#25218)
  Remove `discovery.type` BWC layer from the EC2/Azure/GCE plugins elastic#25080
  When stopping via systemd only kill the JVM, not its control group (elastic#25195)
  Remove PrefixAnalyzer, because it is no longer used.
  Internal: Remove Strings.cleanPath (elastic#25209)
  Docs: Add note about which secure settings are valid (elastic#25212)
  Indices.rollover/10_basic should refresh to make the doc visible in lucene stats
  Port support for commercial GeoIP2 databases from Logstash. (elastic#24889)
  [DOCS] Add ML node to node.asciidoc (elastic#24495)
  expose simple pattern tokenizers (elastic#25159)
  Test: add setting to change request timeout for rest client (elastic#25201)
  Fix secure repository-hdfs tests on JDK 9
  Add target_field parameter to gsub, join, lowercase, sort, split, trim, uppercase (elastic#24133)
  Add Cross Cluster Search support for scroll searches (elastic#25094)
  Adapt skip version in rest-api-spec/test/indices.rollover/20_max_doc_condition.yml
  Rollover max docs should only count primaries (elastic#24977)
  Add remote cluster infrastructure to fetch discovery nodes. (elastic#25123)
  ...
@clintongormley clintongormley added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP and removed :Plugin Ingest GeoIp labels Feb 13, 2018
@omnidepp
Copy link

Could the commercial ISP-database (GeoIP2-ISP.mmdb) please be ported as well? It's working great with logstash, but I'm trying to eliminate logstash in my workflow entirely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement v5.6.0 v6.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants